Introduction

Data Description:

All of the data for this project comes from the Spotify API. We use two main subcategories of data from the Spotify API: the Daily Top 200 Charts and Spotify Track Features.

  1. Daily Top 200 Charts
    (Spotify API)

The Daily Top 200 Charts dataset shows the \(200\) most streamed tracks each day from January 1, 2018 to December 31, 2021. For example, the Spotify Daily Top Songs USA chart lists the most played tracks across the US and is updated daily. The variables included in this dataset are rank, uri, artist_names, track_name, source, peak_rank, previous_rank, days_on_chart, streams.

 

  2. Spotify Track Features
    (Spotify API)

The Spotify Track Features dataset shows audio features for each track streamed. A full list of these, along with their verbal definitions, can be found on Spotify’s page for developers. There are 12 audio features for each track: the confidence measures acousticness, instrumentalness, liveness, and speechiness; the perceptual measures danceability, energy, loudness, and valence; and the descriptors key, duration, mode, and tempo.

 

Final Data

We merged the two data sources into a single data frame with the combination of being the unique identifier of an observation.

Variable | Type | Description
date | Categorical, date | Date of the Spotify chart
track_id | Categorical, str | Unique identifier for each track
track_name | Categorical, str | Title of the track
all_artists | Categorical, str | List of all artist names that appeared on the track
main_artist | Categorical, str | Name of the main artist
main_artist_id | Categorical, str | Unique identifier for each artist
rank | Quantitative, int | Rank from 1-200 (1 is the most streamed track that day)
streams | Quantitative, int | Total number of global streams that day
acousticness | Quantitative, float | Confidence measure of whether the track is acoustic (1.0 is the highest confidence)
danceability | Quantitative, float | Measure of how suitable a track is for dancing (1.0 is most danceable)
energy | Quantitative, float | Perceptual measure of intensity and activity
instrumentalness | Quantitative, float | Prediction of whether a track contains no vocals
key | Categorical, int | Overall key of the track, encoded as an integer
liveness | Quantitative, float | Detection of whether a track was performed live with an audience
loudness | Quantitative, float | Overall loudness of a track in decibels (dB)
mode | Categorical, int | Modality (major or minor) of a track, the type of scale
speechiness | Quantitative, float | Detection of the presence of spoken words in a track
tempo | Quantitative, int | Estimated tempo of a track in beats per minute (BPM)
valence | Quantitative, float | Measure from 0.0 to 1.0 describing the musical positiveness
duration | Quantitative, int | Duration of track in milliseconds
explicit | Categorical, boolean | True or false if contains explicit content
genre | Categorical, str | Name of the genre associated with that track

We do not have a genre for each song, so we assign each song the genres associated with its artist. We would like to discuss this with you in the meeting that we will be scheduling this week.


Questions/Hypotheses:

    1. Does time period change the popularity (proportion of streams) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
      1. Does whether or not it is a weekday change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
      2. Does whether or not it is during the holiday season change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
      3. Does what meteorological season it is change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
    2. Did the popularity of happy songs (mean valence) in the top 200 Spotify US daily streams change during Covid?
    3. What parameters are the most important in predicting the popularity on Spotify in the US?

Analysis:

Question 1.

  1. Does time period change the popularity of genres in the Spotify US daily charts?
    (Question 1)

This question explores the proportion of songs of certain genres (pop, rap, hip hop, r&b, and rock) within a certain time-based constraint (weekday vs weekend, the holiday season or not, meteorological season). As the chart above demonstrates, time plays an important part in what genres people listen to. For example, pop has a very strong seasonality component. People have also been listening to more rock, but less r&b. We want to look at smaller periods of time and see what effect they may have on the proportion of genres in Spotify’s Top 200 charts.

Data:

The variable of interest is the proportion of songs of a certain genre within a certain time-based constraint (weekday vs weekend, the holiday season or not, meteorological season). This is calculated by grouping the data by the time-based constraint, and then calculating the proportion by summing the column for the genre being tested and dividing by the number of rows in that group. This works because summing the column is just counting the number of TRUEs in that column which is equivalent to the number of songs with that genre.

Although the calculation is done on rows and ignores the track_id, the numerator is equivalent to summing, over the unique songs of the genre, the number of days each song appeared on Spotify’s Top 200 chart in the relevant time period.
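A minimal sketch of this calculation in base R, assuming a data frame `charts` with one row per chart entry, a logical genre column `is_pop`, and a grouping column `is_weekend` (all hypothetical names, with toy data):

```r
# Toy data: one row per chart entry (hypothetical column names).
charts <- data.frame(
  is_weekend = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  is_pop     = c(TRUE,  FALSE, TRUE,  TRUE, FALSE)
)

# Summing a logical column counts its TRUEs, so sum / n is the proportion.
n_rows   <- tapply(charts$is_pop, charts$is_weekend, length)
n_genre  <- tapply(charts$is_pop, charts$is_weekend, sum)
prop_pop <- n_genre / n_rows
```

Here `prop_pop` holds one proportion per group, named after the values of the grouping column.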

Part a:

  1. Does whether or not it is a weekday change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?

A weekday is considered to be Monday, Tuesday, Wednesday, or Thursday. A weekend is considered to be Friday, Saturday, or Sunday.
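As a sketch, this split could be derived from the chart date as follows. The example dates and the variable names `dates`, `day_num`, and `day_type` are assumptions made for illustration.

```r
# Hypothetical example dates; in the project these would be the chart dates.
dates <- as.Date(c("2021-03-01", "2021-03-05", "2021-03-07"))

# format(x, "%u") gives the ISO weekday number: 1 = Monday, ..., 7 = Sunday.
day_num <- as.integer(format(dates, "%u"))

# Monday-Thursday (1-4) count as weekday; Friday-Sunday (5-7) as weekend.
day_type <- ifelse(day_num <= 4, "weekday", "weekend")
```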

Statistical Methods

The hypothesis tests were performed for each of the following genres: pop, rap, hip hop, r&b, and rock.

  • Null hypothesis: The proportion of songs in the Spotify US Top 200 Daily Charts during weekdays of the genre being tested is equal to the proportion of songs during weekends. \(H_0\colon p_\mathrm{weekday} = p_\mathrm{weekend}\)

  • Alternative hypothesis: The proportion of songs in the Spotify US Top 200 Daily Charts during weekdays of the genre being tested is NOT equal to the proportion of songs during weekends. \(H_1\colon p_\mathrm{weekday} \neq p_\mathrm{weekend}\)

We want to compare the proportion of songs for each genre on weekdays vs weekends. The two samples (weekday and weekend) are independent with independent observations, and the sample sizes \(n_\mathrm{weekday}\) and \(n_\mathrm{weekend}\) are both large.

Under these assumptions, we can use the two-sided two-sample large sample Z-test to compare the proportions for each genre on weekdays vs the weekend. Because the test will be performed on 5 genres, a Bonferroni correction will be applied to the significance level by dividing 0.05 by 5 (the number of genres), resulting in a significance level of \(\alpha = 0.01\).

The Z-value is calculated using a pooled sample proportion in the following equation:

\[ \begin{align} Z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p} \left(1- \hat{p} \right) \left(1/n_A + 1/n_B \right)}}, && \hat{p} = \frac{n_A \, \hat{p}_A + n_B \, \hat{p}_B}{n_A + n_B} \end{align} \]
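The formula above can be sketched in R as follows. The function name `two_prop_z_test` and the counts are hypothetical and serve only to illustrate the computation; they are not the project's data.

```r
# Two-sided two-sample Z-test for proportions with a pooled estimate.
# x_a, x_b: number of chart entries of the genre in each sample;
# n_a, n_b: total number of chart entries in each sample.
two_prop_z_test <- function(x_a, n_a, x_b, n_b) {
  p_a <- x_a / n_a
  p_b <- x_b / n_b
  p_pool <- (x_a + x_b) / (n_a + n_b)  # same as (n_a*p_a + n_b*p_b) / (n_a + n_b)
  z <- (p_a - p_b) / sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
  p_value <- 2 * pnorm(-abs(z))        # two-sided p-value
  list(z = z, p_value = p_value)
}

# Made-up counts for illustration only.
res <- two_prop_z_test(x_a = 450, n_a = 1000, x_b = 400, n_b = 1000)
alpha <- 0.05 / 5                      # Bonferroni-adjusted level for 5 genres
reject <- res$p_value < alpha
```

The squared Z statistic should match the statistic reported by `prop.test` with `correct = FALSE`, which gives a quick consistency check.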

Results

Based on the p-values produced using a significance level of 0.01, we have evidence to reject the null hypotheses that the proportion of songs by genre is the same on weekdays and the weekend for all the genres tested (pop, rap, hip hop, r&b, and rock).

Looking at the bar chart, however, it is difficult to visually see much difference in the proportions for any genre. This demonstrates that although there is a statistically significant difference, the difference itself is not particularly strong. Interestingly, pop and r&b songs have a higher proportion during the week, whereas rap, hip hop, and rock have a higher proportion during the weekend.


Part b:

  1. Does whether or not it is during the holiday season change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?

The holiday season is considered to be the day after Thanksgiving through December 31st.
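One way to flag these dates, sketched under the assumption that Thanksgiving is taken as the fourth Thursday of November; the helper names `thanksgiving` and `is_holiday_season` are hypothetical:

```r
# Date of US Thanksgiving (fourth Thursday of November) for a given year.
thanksgiving <- function(year) {
  nov <- seq(as.Date(sprintf("%d-11-01", year)), by = "day", length.out = 30)
  thursdays <- nov[format(nov, "%u") == "4"]  # ISO weekday 4 = Thursday
  thursdays[4]
}

# TRUE if a date falls from the day after Thanksgiving through December 31.
is_holiday_season <- function(dates) {
  years <- as.integer(format(dates, "%Y"))
  start <- as.Date(vapply(years,
                          function(y) format(thanksgiving(y) + 1),
                          character(1)))
  # Any date on or after `start` within the same year is at most December 31.
  dates >= start
}
```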

Statistical Methods

The hypothesis tests were performed for each of the following genres: pop, rap, hip hop, r&b, and rock.

  • Null hypothesis: The proportion of songs in the Spotify US Top 200 Daily Charts during the holiday season of the genre being tested is equal to the proportion of songs during the rest of the year. \(H_0 \colon p_\mathrm{holiday} = p_\mathrm{not \, holiday}\)

  • Alternative hypothesis: The proportion of songs in the Spotify US Top 200 Daily Charts during the holiday season of the genre being tested is NOT equal to the proportion of songs during the rest of the year. \(H_1 \colon p_\mathrm{holiday} \neq p_\mathrm{not \, holiday}\)

We want to compare the proportion of songs for each genre during the holiday season and otherwise. The two samples (holiday and not_holiday) are independent with independent observations, and the sample sizes \(n_\mathrm{holiday}\) and \(n_\mathrm{not \, holiday}\) are both large.

Under these assumptions, we can use the two-sided two-sample large sample Z-test to compare the proportions for each genre during the holiday season and otherwise. Because the test will be performed on 5 genres, a Bonferroni correction will be applied to the significance level by dividing 0.05 by 5 (the number of genres), resulting in a significance level of \(\alpha = 0.01\).

The Z-value is calculated using a pooled sample proportion in the following equation:

\[ \begin{align} Z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p} \left(1- \hat{p} \right) \left(1/n_A + 1/n_B \right)}}, && \hat{p} = \frac{n_A \, \hat{p}_A + n_B \, \hat{p}_B}{n_A + n_B} \end{align} \]

Results

Based on the p-values produced using a significance level of 0.01, we have evidence to reject the null hypotheses that the proportion of songs by genre is the same during the holiday season and otherwise for all the genres tested (pop, rap, hip hop, r&b, and rock).

This difference is also much more apparent in the bar chart as compared to the bar chart for subquestion A comparing weekday and weekend, meaning the difference is not only statistically significant, but also strong for a majority of the genres tested. Another interesting observation is that of the 5 genres tested, only rock has a higher proportion during the holiday season than outside of it.


Part c:

  1. Does what meteorological season it is change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?

Spring is considered to be March, April, and May. Summer is considered to be June, July, and August. Fall is considered to be September, October, and November. Winter is considered to be December, January, and February.
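A minimal sketch of this mapping, with `season_of` as a hypothetical helper name:

```r
# Map a date to its meteorological season using the month number.
season_of <- function(dates) {
  month <- as.integer(format(dates, "%m"))
  # Index 1 = January, ..., 12 = December.
  seasons <- c("winter", "winter", "spring", "spring", "spring", "summer",
               "summer", "summer", "fall", "fall", "fall", "winter")
  seasons[month]
}
```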

Statistical Methods

The hypothesis tests were performed for each of the following genres: pop, rap, hip hop, r&b, and rock.

  • Null hypothesis: There is no difference in the proportion of songs in the Spotify US Top 200 Daily Charts of the genre being tested between the different seasons. \(H_0 \colon p_\mathrm{spring} = p_\mathrm{summer} = p_\mathrm{fall} = p_\mathrm{winter}\)

  • Alternative hypothesis: There is a difference in the proportion of songs in the Spotify US Top 200 Daily Charts of the genre being tested between at least two of the seasons. \(H_1 \colon p_\mathrm{season \, 1} \neq p_\mathrm{season \, 2}\) for at least one pair of seasons.

We want to compare the proportion of songs for each genre during each season. The four samples (spring, summer, fall, winter) are independent with independent observations, and the sample sizes are all large.

Under these assumptions, we can use the chi-squared test (which is equivalent to the two-sided two-sample large sample Z-test, except that it can handle more than 2 samples) to compare the proportions for each genre during each season. Because the test will be performed on 5 genres, a Bonferroni correction will be applied to the significance level by dividing 0.05 by 5 (the number of genres), resulting in a significance level of \(\alpha = 0.01\).

The p-value was calculated using the chisq.test function with ‘correct’ set to false. The data input into this function looks similar to the following table (without the season column):
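As a hedged illustration of the shape of such an input and of the call itself, the sketch below uses made-up counts (not the project's data), with one row per season and one column each for chart entries of the genre and all other entries. The object names are assumptions.

```r
# Made-up counts for illustration only.
# Rows are seasons; columns are chart entries of the genre vs. all others.
counts <- matrix(
  c(300, 700,
    320, 680,
    280, 720,
    250, 750),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("spring", "summer", "fall", "winter"),
                  c("genre", "not_genre"))
)

res <- chisq.test(counts, correct = FALSE)
alpha <- 0.05 / 5              # Bonferroni-adjusted level for 5 genres
reject <- res$p.value < alpha
```

Note that in `chisq.test` the `correct` argument only affects 2-by-2 tables, so it has no effect on a four-season table like this one.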

Results

Based on the p-values produced using a significance level of 0.01, we have evidence to reject the null hypotheses that the proportion of songs by genre is the same regardless of season for all the genres tested (pop, rap, hip hop, r&b, and rock).

Some interesting observations from the bar chart include that rap seems to be much lower in the winter, and summer and spring appear to be the most similar in terms of proportion of genre.


Question 2.

  1. Did the popularity of happy songs in the top 200 Spotify charts change during Covid?
    (Question 2)

Calculated Variables:

We use the Spotify Daily Top Tracks data as described in the Data Description above. In particular, Question 2 makes use of the track_name, valence, and date columns. Based on these columns, we created a covid variable to indicate whether a track entry appeared on the top 200 chart before Covid (date < “03/13/2020”) or after Covid (date >= “03/13/2020”). Entries before the cutoff are labeled before, whereas entries on or after the cutoff are labeled after.

track | valence | date | covid
SUGAR | 0.516 | 2019-08-25 | before
Open Letter | 0.360 | 2018-10-02 | before
Santa Baby (with Henri René & His Orchestra) | 0.490 | 2018-11-29 | before
Body (feat. brando) | 0.582 | 2018-10-08 | before
Bohemian Rhapsody - Remastered 2011 | 0.227 | 2019-02-26 | before
Peaches (feat. Daniel Caesar & Giveon) | 0.488 | 2021-08-06 | after
Plug Walk | 0.158 | 2019-03-25 | before
I Get the Bag (feat. Migos) | 0.425 | 2018-04-10 | before
Peaches (feat. Daniel Caesar & Giveon) | 0.488 | 2021-08-25 | after
Stir Fry | 0.498 | 2018-03-20 | before
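A sketch of how such a covid variable could be built. The data frame `df` and the use of a factor with ordered levels are assumptions, and the three rows are taken from the sample above.

```r
# A few rows from the sample above; `df` and its column names are assumptions.
df <- data.frame(
  track_name = c("SUGAR", "Plug Walk", "Peaches (feat. Daniel Caesar & Giveon)"),
  valence    = c(0.516, 0.158, 0.488),
  date       = as.Date(c("2019-08-25", "2019-03-25", "2021-08-06"))
)

cutoff <- as.Date("2020-03-13")

# Order the levels so that "before" comes first in tapply() output.
df$covid <- factor(ifelse(df$date < cutoff, "before", "after"),
                   levels = c("before", "after"))
```

Setting the factor levels explicitly matters because grouped summaries are returned in level order, which is otherwise alphabetical ("after" first).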

Statistical Method:

We want to compare mean valence values before and after Covid. For the data, the two samples (before and after) are independent with independent observations, and the sample sizes \(n_{\mathrm{before}}\) and \(n_{\mathrm{after}}\) are both large. Under these assumptions, we can use the two-sided two-sample large sample Z-test to compare the mean valences before and after Covid. This means that we compare the Z test statistic to the standard normal distribution.

We test the null hypothesis of no difference between valence means, i.e., we define the null hypothesis as \(H_0 : \mu_{\mathrm{before}} = \mu_{\mathrm{after}}\) versus the 2-sided alternative hypothesis \(H_1 : \mu_{\mathrm{before}} \neq \mu_{\mathrm{after}}\), where \(\mu_{\mathrm{before}}\) and \(\mu_{\mathrm{after}}\) are the mean valences per song for songs added before and after Covid, respectively. The test is conducted at a significance level of \(\alpha = 0.05\).

Results:

First, we calculate the sample mean (\(\bar{X}_{\mathrm{before}}\), \(\bar{X}_{\mathrm{after}}\)), standard deviation (\(s_{\mathrm{before}}\), \(s_{\mathrm{after}}\)), and size (\(n_{\mathrm{before}}\), \(n_{\mathrm{after}}\)) for each of the two samples.

m = with(df, tapply(valence, covid, mean))
s = with(df, tapply(valence, covid, sd))
n = with(df, tapply(valence, covid, length))
sample | mean | std dev | size
before | 0.4572265 | 0.2014658 | 1117863
after | 0.4826522 | 0.2272466 | 592944

Using these values, we can then calculate the test statistic:

\[ \begin{align} Z & = \frac{\left|\bar{X}_\mathrm{before} - \bar{X}_\mathrm{after}\right|}{\sqrt{s^2_\mathrm{before} / n_\mathrm{before} + s^2_\mathrm{after} / n_\mathrm{after}}} = \frac{\left|0.4572 - 0.4826 \right|}{\sqrt{0.2015^2 / 1117863 + 0.2273^2/592944}} = 72.379 \end{align} \]

Next, we calculate the p-value using the standard normal distribution (since \(n_\mathrm{before}\) and \(n_\mathrm{after}\) are both large). The p-value is a probability about the test statistic, calculated under the assumption that the null hypothesis is true.

  • If the p-value is less than \(\alpha\) (i.e., \(p \lt 0.05\)), then we reject the null hypothesis of equal means.
    • This would mean that we have evidence that the mean valences per song before and after Covid are not equal (i.e., the popularity of happy songs changed during Covid).
  • If the p-value is greater than \(\alpha\) (i.e., \(p \gt 0.05\)), then we do not reject the null hypothesis of equal means.
    • This would mean that we do not have evidence that the mean valences per song before and after Covid differ (i.e., no evidence that the popularity of happy songs changed during Covid).

# Use the absolute difference so the two-sided p-value is computed from |Z|
z = abs(m[1] - m[2] - 0) / sqrt(sum(s^2 / n))
p = 2 * (1 - pnorm(z))

The p-value for the test is \(p \lt 0.001\). Based on the test, we reject the null hypothesis of equal valence means at the \(0.05\) level of significance.

We can also calculate the confidence interval for the difference between population means. A confidence interval provides additional information beyond the hypothesis test. In general, we can interpret a confidence interval as the set of all values of the population parameter that would not have been rejected by the corresponding hypothesis test. Hence, we can state the null hypothesis as \(H_0: \mu_\mathrm{before} - \mu_\mathrm{after} = 0\) and then check whether the confidence interval contains the value \(0\).

se = sqrt(s[1]^ 2 / n[1] + s[2]^ 2 / n[2])
z.05 = qnorm(0.975)
lower = m[1] - m[2] - z.05 * se
upper = m[1] - m[2] + z.05 * se

The 95% confidence interval for the difference between population means \(\mu_\mathrm{before} - \mu_\mathrm{after}\) is \((-0.0261, -0.0247)\); that is, the mean valence after Covid is higher by between \(0.0247\) and \(0.0261\). This is consistent with the result of the Z-test. Since the interval does not contain the value \(0\), we would reject the null hypothesis \(H_0: \mu_\mathrm{before} - \mu_\mathrm{after} = 0\) and conclude that the mean valences per song before and after Covid are different.


Question 3.

  1. What parameters are the most important in predicting the popularity on Spotify in the US?
    (Question 3)

Calculated Variables:

For this question we used the Spotify Daily Top Tracks data described in the dataset section. We then aggregated the data by song id, so that each song has its own row. Because a song has the same value of each attribute on every row, taking the mean of an attribute simply recovers that song's value. These attributes include explicit, acousticness, danceability, duration, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo and valence. Then, we created a new variable for popularity and named it updated_rank. We calculate updated_rank as 201 - rank (i.e. rank 1 has score 200, rank 200 has score 1), so a higher updated_rank means the song is performing better in the ranks. We then sum this updated_rank over all of a song's chart entries to get its popularity score. This way, the longer a song stays in the Spotify top 200, the higher its popularity score.
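The popularity score can be sketched as follows, using toy rows and the hypothetical names `charts` and `popularity`:

```r
# Toy chart rows; `charts` and its column names are assumptions.
charts <- data.frame(
  track_id = c("a", "a", "b", "b", "b"),
  rank     = c(1, 3, 200, 150, 100)
)

# Reverse the rank so that a better chart position earns a higher score.
charts$updated_rank <- 201 - charts$rank

# Sum the daily scores per song to get its popularity score.
popularity <- aggregate(updated_rank ~ track_id, data = charts, FUN = sum)
```

In this toy example, song "a" spends two days near the top of the chart and still outscores song "b", which spends three days lower down.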

Statistical Methods

To answer this question we created three linear regression models and looked at the significance of each independent variable in each model. For each independent variable \(x\) in a model, we look at the p-value of the t-test of \(H_0\colon \beta_x = 0\) against \(H_a\colon \beta_x \neq 0\), where \(\beta_x\) is the coefficient of \(x\) in the linear regression model. We started by looking at the correlation matrix.

From the correlation matrix above, we can see that streams and updated_rank are highly correlated. This makes sense, since the rank on Spotify is based on the number of streams. We will remove streams when fitting the linear regressions, because it determines the rank from which the popularity score is calculated and is therefore not a meaningful predictor variable. Other than the total number of streams, there is little to no correlation between the other variables and updated_rank.

We then continue on to make three linear regression models. Each model is described below:

  • Model1: Simple additive model using all predictor variables in the correlation matrix above
  • Model2: Model created using backwards AIC and all predictor variables in the correlation matrix above as well as all the interaction terms
  • Model3: Model created using backwards AIC and all predictor variables in the correlation matrix above as well as all the interaction terms and taking log base 10 of the updated_rank
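A sketch of how three such models could be fit in R, using simulated data with only three predictors. The data, the object names, and the choice of pairwise interactions (`.^2`) for "all the interaction terms" are assumptions, not the project's actual specification.

```r
# Simulated data for illustration only; names mirror the report's variables.
set.seed(1)
n <- 200
songs <- data.frame(
  danceability = runif(n),
  energy       = runif(n),
  loudness     = rnorm(n, mean = -7, sd = 2)
)
# Positive response so that the log10 transform in Model 3 is defined.
songs$updated_rank <- exp(1 + songs$danceability + rnorm(n))

# Model 1: additive model with all predictors.
model1 <- lm(updated_rank ~ ., data = songs)

# Model 2: start from all pairwise interactions, then backward AIC selection.
full2  <- lm(updated_rank ~ .^2, data = songs)
model2 <- step(full2, direction = "backward", trace = 0)

# Model 3: same procedure with a log10-transformed response.
full3  <- lm(log10(updated_rank) ~ .^2, data = songs)
model3 <- step(full3, direction = "backward", trace = 0)

adj_r2 <- sapply(list(model1, model2, model3),
                 function(m) summary(m)$adj.r.squared)
```

Because Model 3 is fit on a transformed response, its adjusted R-squared is not directly comparable with those of Models 1 and 2.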

Below are the coefficient estimates, p-values and diagnostic plots for each model.

Model 1: Adjusted R-squared: 0.01226

Model 2: Adjusted R-squared: 0.01749

Model 3: Adjusted R-squared: 0.3083

Results

We found that none of the three models was good at predicting the popularity of a song. The adjusted R-squared values for all three models were very low. Some of the predictor variables were significant, but the coefficients were too small to have a strong impact on updated_rank.

Another important thing to note is that none of the models meet the assumptions for linear regression. Constant variance does not hold, as the residuals in the fitted vs. residuals plots are not evenly spread around 0. Linearity does not hold, as the mean of the residuals in the fitted vs. residuals plots does not look close to 0, and normality is violated, as the points in the Q-Q plots do not follow a straight line. All three models are therefore unreliable.

However, we can look at the commonalities between the models to answer our question. In all three models, danceability, energy, and loudness, or an interaction term containing one of these variables, are statistically significant. Although the models do not meet the requirements for linear regression and are not very accurate, all three show significance for these variables, so we cautiously conclude that danceability, energy, and loudness are the most influential factors when modeling the popularity of a song on the Spotify top 200.


Discussion


References